Automatic Tag Attachment Scheme based on Text Clustering for Efficient File Search in Unstructured Peer-to-Peer File Sharing Systems

Authors

  • Ting Ting Qin
  • Satoshi Fujita
Abstract

In this paper, the authors address the issue of automatically attaching tags to documents distributed over a P2P network, with the aim of improving the efficiency of file search in such networks. The proposed scheme combines text clustering with a modified tag extraction algorithm and is executed in a fully distributed manner. The optimal number of clusters is also determined automatically through a distance cost function. We have conducted experiments to evaluate the accuracy of the proposed scheme. The results indicate that the proposed approach is capable of effective and efficient tag attachment in real scenarios; i.e., for more than 90% of documents, it attaches the same tags as those attached by human reviewers. Moreover, the experiments show that the optimal number of clusters is almost the same as the number of topics on the website.

Similar resources

Popularity-Based Replication Strategy in Unstructured P2P File Sharing Systems

Peer-to-Peer (P2P) networks have been shown to be an efficient and successful mechanism for file sharing over the internet. Unstructured P2P systems usually use a blind search method to find the requested data object. Observations have shown that a few peers share most of the data. In order to increase the success rate of blind search and data availability and load balancing, replication techniqu...

LightFlood: an Efficient Flooding Scheme for File Search in Unstructured Peer-to-Peer Systems

“Flooding” is a fundamental operation in unstructured Peer-to-Peer (P2P) file sharing systems, such as Gnutella. Although it is effective in content search, flooding is very inefficient because it results in a great amount of redundant messages. Our study shows that more than 70% of the generated messages are redundant for a flooding with a TTL of 7 in a moderately connected network. Existing e...

Efficient Music Genre Retrieval Based on Peer Interest Clustering in P2P Networks

Content-based music retrieval is desirable in Peer-to-Peer (P2P) networks, given its popularity with users and its capacity for semantic search; however, its intensive computing cost raises a barrier to efficiency and scalability. In this paper, we propose an approach of music genre retrieval based on peer interest clustering. Automatic music feature extraction and adaptive shared music file clust...

P2P Network Trust Management Survey

Peer-to-peer (P2P) applications are no longer limited to home users, and are starting to be accepted in academic and corporate environments. While file sharing and instant messaging applications are the most traditional examples, they are no longer the only ones benefiting from the potential advantages of P2P networks. For example, network file storage, data transmission, distributed computing, and co...

Connectivity Based Node Clustering in Decentralized Peer-to-Peer Networks

Connectivity-based node clustering has wide-ranging applications in decentralized Peer-to-Peer (P2P) networks such as P2P file sharing systems, mobile ad-hoc networks, P2P sensor networks and so forth. This paper describes a Connectivity-based Distributed Node Clustering scheme (CDC). This scheme presents a scalable and efficient solution for discovering connectivity-based clusters in peer n...

Journal:
  • J. UCS

Volume 18

Published 2012